physical world
Roundtables: Can AI Learn to Understand the World?
Watch a subscriber-only discussion exploring how AI might enter the physical world. AI companies want to build systems that understand the external world and overcome the limitations of LLMs. Recent developments have brought world models to the forefront of the AI discussion. Watch a conversation with editor in chief Mat Honan, senior AI editor Will Douglas Heaven, and AI reporter Grace Huckins exploring how AI might enter the physical world. A woman's uterus has been kept alive outside the body for the first time Jessica Hamzelou Want to understand the current state of AI? Check out these charts. Want to understand the current state of AI? Check out these charts.
Full-Distance Evasion of Pedestrian Detectors in the Physical World
Many studies have proposed attack methods to generate adversarial patterns for evading pedestrian detection, alarming the computer vision community about the need for more attention to the robustness of detectors. However, adversarial patterns optimized by these methods commonly have limited performance at medium to long distances in the physical world. To overcome this limitation, we identify two main challenges. First, in existing methods, there is commonly an appearance gap between simulated distant adversarial patterns and their physical world counterparts, leading to incorrect optimization. Second, there exists a conflict between adversarial losses at different distances, which causes difficulties in optimization. To overcome these challenges, we introduce a Full Distance Attack (FDA) method. Our physical world experiments demonstrate the effectiveness of our FDA patterns across various detection models like YOLOv5, Deformable-DETR, and Mask RCNN.
Prediction with Action: Visual Policy Learning via Joint Denoising Process
Diffusion models have demonstrated remarkable capabilities in image generation tasks, including image editing and video creation, representing a good understanding of the physical world. On the other line, diffusion models have also shown promise in robotic control tasks by denoising actions, known as diffusion policy. Although the diffusion generative model and diffusion policy exhibit distinct capabilities--image prediction and robotic action, respectively--they technically follow similar denoising process. In robotic tasks, the ability to predict future images and generate actions is highly correlated since they share the same underlying dynamics of the physical world. Building on this insight, we introduce \textbf{PAD}, a novel visual policy learning framework that unifies image \textbf{P}rediction and robot \textbf{A}ction within a joint \textbf{D}enoising process. Specifically, PAD utilizes Diffusion Transformers (DiT) to seamlessly integrate images and robot states, enabling the simultaneous prediction of future images and robot actions.
Interview with AAAI Fellow Yan Liu: machine learning for time series
Each year the AAAI recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. Over the course of the next few months, we'll be talking to some of the 2026 AAAI Fellows . In this interview, we met with Yan Liu, University of Southern California, who was elected as a Fellow . We found out about how time series research has progressed, the vast range of applications, and what the future holds for this field. Could you start with a quick introduction to your area of research?
The Good Robot podcast: what makes a drone "good"? with Beryl Pong
The Good Robot podcast: what makes a drone "good"? Hosted by Eleanor Drage and Kerry McInerney, The Good Robot is a podcast which explores the many complex intersections between gender, feminism and technology. What makes a drone "good"? In this episode, we talk to Beryl Pong, UKRI Future Leaders Fellow at the University of Cambridge, where she leads the Centre for Drones and Culture. Beryl reflects on what it means to think about drones as "good" or "ethical" technologies and how it can be assessed through its socio-political context.
AI enables a Who's Who of brown bears in Alaska
AI enables a Who's Who of brown bears in Alaska Being able to distinguish individual animals - including their unique history, movement patterns and habits - can help scientists better understand how their species function, and therefore better manage habitats and study population dynamics. Today, most computer vision systems for tracking animals are effective on species with patterns and markings, such as zebras, leopards and giraffes. The task is much more complicated for unmarked species where individual differences are harder to spot. Distinguishing a particular brown bear from its peers in a non-invasive way requires an incredible eye for detail and years of viewing the same bears over time. What's more, these bears emerge from hibernation in the spring with shaggy fur and having lost quite a bit of weight and then substantially increase their body weight feasting on salmon, as well as fully shedding their winter coat - that's enough to throw off experts as well as AI algorithms.
Learning to see the physical world: an interview with Jiajun Wu
What is your research area? My research topic, at a high level, hasn't changed much since my dissertation. It has always been the problem of physical scene understanding - building machines that see, reason about, and interact with the physical world. Besides learning algorithms, what are the levels of abstraction needed by Al systems in their representations, and where do they come from? I aim to answer these fundamental questions, drawing inspiration from nature, i.e., the physical world itself, and from human cognition.
How can robots acquire skills through interactions with the physical world? An interview with Jiaheng Hu
How can robots acquire skills through interactions with the physical world? One of the key challenges in building robots for household or industrial settings is the need to master the control of high-degree-of-freedom systems such as mobile manipulators. Reinforcement learning has been a promising avenue for acquiring robot control policies, however, scaling to complex systems has proved tricky. In their work SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, and introduce a method that renders real-world reinforcement learning feasible for complex embodiments. We caught up with Jiaheng to find out more.
Yann LeCun's new venture is a contrarian bet against large language models
Yann LeCun's new venture is a contrarian bet against large language models In an exclusive interview, the AI pioneer shares his plans for his new Paris-based company, AMI Labs. Yann LeCun is a Turing Award recipient and a top AI researcher, but he has long been a contrarian figure in the tech world. He believes that the industry's current obsession with large language models is wrong-headed and will ultimately fail to solve many pressing problems. Instead, he thinks we should be betting on world models--a different type of AI that accurately reflects the dynamics of the real world. He is also a staunch advocate for open-source AI and criticizes the closed approach of frontier labs like OpenAI and Anthropic. Perhaps it's no surprise, then, that he recently left Meta, where he had served as chief scientist for FAIR (Fundamental AI Research), the company's influential research lab that he founded. Meta has struggled to gain much traction with its open-source AI model Llama and has seen internal shake-ups, including the controversial acquisition of ScaleAI. LeCun sat down with in an exclusive online interview from his Paris apartment to discuss his new venture, life after Meta, the future of artificial intelligence, and why he thinks the industry is chasing the wrong ideas.